The traditional use of scores derived from
standardized, norm-referenced achievement tests is examined as to its
role in a local school system's efforts to monitor itself and to
examine how well individual students, schools and the overall system
are functioning. Several questions regarding the legitimacy of
well-established practices involving test use are raised. These
questions include: (1) how much and what kind of information do
norm-referenced standardized tests really provide for program
managers and school administrators, and (2) what are the real
differences between an achievement test and an ability test in what
they measure and how they should be used? It is concluded that the
use of standardized tests should be closely examined, and that
alternative methods are needed for measuring achievement expectancy
of students as well as schools. (Author/GK)
Background



Testing plays a major role in a local school system's efforts to monitor
itself and to examine how well individual students, schools, and the overall
system are functioning. In fact, many assessors of quality education use test
results as the major barometer of success or failure of public education and
place the number derived from a series of well filled bubbles far above any
other index of how well children are being taught. In the school system in
which I work, Montgomery County, Maryland, we (those people in charge of
administering and interpreting test data) have for the past two years been
doing some serious soul searching regarding standardized test scores and the
information they provide. We have looked at the uses to which the tests have
been put and have come to pose some very serious questions regarding the legitimacy
of well established practices.

The questions we have been asking fall into two general categories. First, 
how much and what kind of information do norm referenced standardized tests 
really provide for program managers and school administrators? And, second, 
what are the real differences, if any, between an achievement test and a 
so-called "abilities" test in what they measure and how they should be used? 

Uses of Tests 

In order to understand more fully our concerns and why these questions have 
arisen, it may be helpful to look at the ways in which test data have been 
used in Montgomery County. In all likelihood, our practices can be considered
reasonably typical of practices in major school systems nationwide.
Standardized norm referenced achievement tests are given as part of a
statewide testing program. Achievement tests are given to all students in
Grades 3, 5, and [illegible]. Until this year administration of an abilities test was
also required at the same grades. Now, in the eyes of the state,
abilities testing is considered optional.

The traditional uses of scores derived from these tests fall into four general
categories: evaluating the skills and needs of individual students,
evaluating schools, evaluating instructional programs, and evaluating the
school system. These are discussed briefly below:



1. Evaluating the skills and needs of individual students. On the individual
level, norm referenced test scores have had several uses.

o Individual diagnosis and programming - to provide staff with
information which supplements grades and professional judgement, to
be used in determining individual needs and suggesting activities
from which a student might benefit. Comparisons between achievement
and abilities test performance have been considered by staff to be an
important indicator of satisfactory progress, "over" or "under"
achievement.

o Communication with parents - to inform parents about the educational 
attainments of their children. Comparisons between achievement and 
abilities test scores have provided one indicator of student 
motivation, as well as the degree to which a school is meeting a 
student's needs. 

o Screening for special programs - to help staff in selecting students
for special programs, such as gifted and talented and advanced
placement courses. Abilities test performance, in particular, has
until very recently played a major role in deciding among students
who otherwise appear to have equivalent qualifications.

o Grouping - to place students in special classes or groups within
classes which differ in content, level, or pace of instruction.

2. Evaluating schools. There are two general ways in which standardized test
data have been used to evaluate schools. 

o Assessing individual school performance - to determine whether or not
individual schools are providing quality instruction which meets the
needs of their student populations. As in the case of individual
diagnosis, comparisons have frequently been made between school
ability and achievement test scores to determine whether or not the
school, and by inference the principal, was functioning at, above, or
below what might be expected given the ability level of the enrollees.

o Comparing performance among schools - to determine which schools are
the best and to rank schools vis-a-vis each other in terms of their
academic accomplishments. (The real estate agent's dream.)
Sometimes abilities test scores have been used as a kind of "control
variable" in making such comparisons.



3. Evaluating the effectiveness of instructional programs - to determine
whether different approaches to teaching a subject, such as reading, are
differentially effective. As described above, abilities test scores have
sometimes been used as a covariate in evaluation designs.

4. Evaluating school system quality - to determine whether or not the system
as a whole is providing a quality educational program for its students.
Here, differences between systemwide abilities and achievement test scores
(or lack thereof) have again been used as a standard against which to
measure performance. In many cases, when such comparisons are made, the
district superintendent is the recipient of either the praise or the
blame which accrues from this activity.

While this list is lengthy, it is likely that it is under rather than over
inclusive. I think it's fair to say that information from standardized norm
referenced tests is used extremely widely and affects decision making at all
levels of local educational systems.

The Problem

Why do we question these uses? Three major concerns will be discussed here. 
And, up front, I want to point out that, to some extent, these concerns stem
from misinterpretation or misuse of test data by well meaning believers in the
power of the "objective" quantitative approach and are not the fault of the tests
themselves or the test developers/publishers. To differing degrees, however,
I feel it is fair to say that the test marketers and their written materials
encourage such uses.

The first concern is that many of these practices involve using data from
tests primarily intended to provide data on groups to make decisions about
individual students. The tests are not only group administered but designed
to be most reliable when used to measure group performance. We question
whether or not the scores of individuals are sufficiently accurate for the
uses to which they are put. For example, given the measurement error of
the tests, can we really say that a student with a score at the 95th
percentile is superior to someone with a score at the 93rd or even 90th
percentile? Can a cut score be justified?
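
The question of whether a score at the 95th percentile can be told apart from one at the 93rd or 90th can be illustrated with a rough calculation. The sketch below is not drawn from any test manual: it assumes a hypothetical individual-score reliability of 0.90, converts percentile ranks to normal curve equivalents (NCEs, mean 50 and standard deviation 21.06), and applies the classical standard error of measurement. The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

NCE_SD = 21.06              # standard deviation of the NCE scale (mean 50)
ASSUMED_RELIABILITY = 0.90  # hypothetical value, not taken from any test manual


def percentile_to_nce(percentile: float) -> float:
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return 50.0 + NCE_SD * z


def standard_error_of_measurement(reliability: float = ASSUMED_RELIABILITY) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return NCE_SD * sqrt(1.0 - reliability)


def clearly_different(pct_a: float, pct_b: float,
                      reliability: float = ASSUMED_RELIABILITY) -> bool:
    """True if two students' scores differ by more than 1.96 standard errors
    of the difference (SEM * sqrt(2) for two independent scores)."""
    se_diff = standard_error_of_measurement(reliability) * sqrt(2.0)
    gap = abs(percentile_to_nce(pct_a) - percentile_to_nce(pct_b))
    return gap > 1.96 * se_diff


if __name__ == "__main__":
    for other in (93, 90):
        print(95, other, clearly_different(95, other))
```

Under the assumed reliability, neither the 93rd nor the 90th percentile is distinguishable from the 95th by this criterion; a different reliability value would shift the boundary but not the nature of the concern.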

The second concern is whether there is in fact any useful difference between
what is measured by ability and achievement tests and whether the practice of
comparing scores on the two measures, as was mentioned in several of the
examples above, makes sense. This is clearly not a new issue in the area of
test usage. It is one that remains hotly debated, however, as the
recommendation that abilities tests be used as a standard against which to measure
achievement test performance continues to be made. This has become a very
emotionally charged area. Further, there is so much disagreement among
experts that the potential for debate seems almost endless. For example, at a
work session before our Board of Education last spring, out of ten experts our
department assembled we managed to have five who endorsed the practice and
five who did not. This hardly provided us with convincing support for
rejecting a practice with a long history in our county. Nonetheless, we
question whether or not the two kinds of tests differ from each other any more
than two so-called achievement tests do, and we are very uncomfortable with
setting one as the basic standard against which to examine performance on
the other.



The third concern is that in general we feel that the public and educators
tend to overrate the information provided by standardized test scores. There
is something seductive about the apparent simplicity and objectivity of a
number derived from a paper and pencil test. However, the kinds of decisions
for which they are used are in fact very complicated. Standardized test
scores are only one of many factors which should be taken into account in
drawing conclusions. Unfortunately, this is all too infrequently the case.

The publication of school by school test scores always makes headlines in the
local papers. It leads parents and the public to draw rapid and sometimes
quite inaccurate inferences about school/principal performance.



Solutions



We have found that pointing out these problems and raising these questions
does not readily lead to consensus or modification of practice. For policy
makers (Board of Education members, administrators, etc.) it is not sufficient
to say that a practice is invalid, especially a practice that has proven
useful. An alternative solution must be offered.



We have spent a good deal of time over the last year trying to find some
alternative solutions. If I had to give our efforts a grade, I think a "B"
would be considered fair. We've done pretty well, for example, in the area of
student selection for special programs. Strict ranking by test scores is now
officially discouraged (although not eliminated in practice) and the
importance placed on scores on so-called abilities tests over other measures
is decreased. In time we may even see more emphasis placed on work samples
and other more direct measures of skill level in a specific area.

We have introduced a different way to look at whether or not schools are
effective. Specifically, we now do comparisons of longitudinal data for
students tested in the same school at more than one grade level (e.g., in the
third and fifth grade for elementary schools). We look to see whether there
is any "substantial" difference in test performance (composite or subtest) for
the group between the two test periods. To reach this judgement we assume that,
all other things being equal, students would be expected to rank similarly at
the two test points. In other words, the best predictor of future performance
is current performance. If performance across the grades differs by a
specified amount (±7 NCE points, or 1/3 of a standard deviation, from the
county trend), it is considered an indicator that the school may be especially
effective (or ineffective) in the area the test is measuring. It should be
noted that in making judgements about the performance of individual schools
the countywide trend is considered because it is important to guard against
attributing to a school strengths or weaknesses that in reality relate to the
countywide curriculum. Thus, if the county trend were to increase from 80
NCEs to 82 NCEs, a gain of 2 NCEs, a school would have to show an increase of
10 NCEs to be considered effective.
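
The flagging rule just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the county's actual procedure: the function name, the labels it returns, and the choice to treat a deviation of exactly 7 NCEs as meeting the criterion are all hypothetical. The threshold of 7 corresponds to roughly one third of the NCE standard deviation of 21.06.

```python
FLAG_THRESHOLD_NCE = 7.0  # about 1/3 of the NCE standard deviation (21.06 / 3)


def flag_school(school_start: float, school_end: float,
                county_start: float, county_end: float,
                threshold: float = FLAG_THRESHOLD_NCE) -> str:
    """Compare a cohort's change between two test points with the county trend.

    All scores are mean NCEs for the same group of students tested at two
    grade levels. The inclusive comparison at the threshold is an assumption.
    """
    school_gain = school_end - school_start
    county_gain = county_end - county_start
    deviation = school_gain - county_gain  # change beyond the countywide trend

    if deviation >= threshold:
        return "possibly effective"
    if deviation <= -threshold:
        return "possibly ineffective"
    return "not flagged"


if __name__ == "__main__":
    # County trend rises from 80 to 82 NCEs; the school's cohort gains 10 NCEs.
    print(flag_school(70.0, 80.0, 80.0, 82.0))
```

Subtracting the county gain before applying the threshold is what keeps a countywide curriculum effect from being credited to, or blamed on, a single school.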

This way of assessing school effectiveness seems to us to be far better than 
comparing performance on achievement and so-called abilities tests. However, 
we are aware that it also has shortcomings. One could argue that ±7 NCEs is a
rather arbitrary figure and that there is no convincing basis for selecting
that criterion over others. Regression to the mean may be occurring for our
extreme scoring schools, and we are not quite sure as yet how to take this
into account. We are not convinced that regression analyses totally solve our
problem. In addition, under these criteria, if a principal wanted to manipulate
the system, she could place her weakest teachers in the third grade and her
strongest teachers in the fourth and fifth grades.
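
The regression-to-the-mean concern can be made concrete with a small calculation. The sketch below assumes that scores at both test points have the same mean and spread and that they correlate at a hypothetical 0.8; neither figure comes from county data. Under those assumptions the expected second score is pulled toward the mean in proportion to the correlation.

```python
def expected_retest_score(first_score: float,
                          population_mean: float = 50.0,
                          retest_correlation: float = 0.8) -> float:
    """Expected second score given the first, assuming equal means and
    standard deviations at both test points (hypothetical correlation)."""
    return population_mean + retest_correlation * (first_score - population_mean)


def expected_change(first_score: float,
                    population_mean: float = 50.0,
                    retest_correlation: float = 0.8) -> float:
    """Change expected from regression alone, with no real change in quality."""
    return expected_retest_score(first_score, population_mean,
                                 retest_correlation) - first_score


if __name__ == "__main__":
    for score in (20.0, 50.0, 80.0):
        print(score, round(expected_change(score), 2))
```

With these assumed values, a group starting 30 NCEs above the mean would be expected to drop about 6 NCEs by regression alone, which is close to the 7-point criterion. School averages are usually more stable than individual scores, so the real effect may be smaller, but the direction of the bias for extreme scoring schools is the same.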

For these reasons we stress that this analysis provides only an indicator of
whether a school may be more or less effective and that trends across multiple
years should be examined. We try to emphasize that it is a way of flagging
schools for further study by professionals more familiar with curricula and 
instruction. We are encouraged, however, by the potential of this approach 
especially since some of the schools which are flagged include ones not 
typically cited as being outstanding where traditional school ranking methods 
(ranking according to a single year's performance) are used. Specifically, 
for the first time, schools in the highest SES areas are showing up as having 
academic problems and schools in relatively lower SES areas are identified as 
having noteworthy programs. 

We have not totally succeeded in convincing people that this approach is 
better than one which compares achievement to ability and it clearly is not a 
direct substitute. It does not answer the question of whether a school or a 
child is doing as well as he/she should. However, we are unconvinced that we 
have or ever have had a valid measure of this expectation. Unfortunately, 
this opinion is not shared by some very important policy makers. 
Conclusions

School districts continue to struggle with the problem of interpreting and
using standardized test data. While scores on standardized norm referenced
tests can be very helpful when used appropriately as a decision-making tool,
the same data, when misused, can do extreme damage to individuals and
institutions. While we want to continue to examine the use of standardized
tests for all the purposes described earlier, we are especially concerned about
the use of abilities tests and about answering questions regarding what they
measure and how, and if, they should be used in ways which differ from uses of
achievement tests. We are also looking for suggestions for alternative
methods for measuring expectancy, or how well a school or child "should" be
doing, if such exist. Any light that this panel sheds on these issues will be
greatly appreciated.